Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance
Authors
Abstract
The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-art transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. Future experiments may reveal additional benefits provided by this new method.
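The contrast the abstract draws between naive truncation and importance-guided selection can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `truncate_head` and `truncate_guided`, the `window` parameter, and the head-first filling rule are all assumptions made for illustration, and the set of important tokens is assumed to have been obtained beforehand (for example, from the feature importances of a simple classifier).

```python
def truncate_head(tokens, limit):
    """Naive truncation: keep only the first `limit` tokens."""
    return tokens[:limit]


def truncate_guided(tokens, important, limit, window=2):
    """Hypothetical importance-guided truncation (illustrative only).

    Keeps tokens within `window` positions of any token found in
    `important`, then fills the remaining budget from the start of the
    text. The original token order is preserved in the output.
    """
    if len(tokens) <= limit:
        return list(tokens)

    keep = set()
    # First pass: reserve positions around important tokens.
    for i, tok in enumerate(tokens):
        if tok not in important:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if len(keep) < limit:
                keep.add(j)

    # Second pass: spend any leftover budget on the beginning of the text.
    for j in range(len(tokens)):
        if len(keep) >= limit:
            break
        keep.add(j)

    return [tokens[j] for j in sorted(keep)]


tokens = "a b c d e f g KEY h i".split()
print(truncate_head(tokens, 4))
print(truncate_guided(tokens, {"KEY"}, 4, window=1))
```

In this toy input, head truncation discards the token marked as important, while the guided variant keeps it together with its neighbors. The values of `limit` and `window` are tunable, which loosely mirrors the abstract's remark that parameter optimization must precede deployment.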
Similar resources
A Novel One Sided Feature Selection Method for Imbalanced Text Classification
Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and might even treat the minority class data as outlier data. The text data is one of t...
The Feature Selection Method based on Genetic Algorithm for Efficient of Text Clustering and Text Classification
Big Data means a very large amount of data and includes a range of methodologies such as big data collection, processing, storage, management, and analysis. Since Big Data Text Mining extracts a lot of features and data, clustering and classification can result in high computational complexity and the low reliability of the analysis results. In particular, a TDM (Term Document Matrix) obtained ...
The Impact of Skopos on Syntactic Features of the Target Text
The present study is an experimental case study which investigates the impacts, if any, of skopos on syntactic features of the target text. Two test groups, each consisting of 10 MA students, translated a set of sentences selected from advertising texts in the operative and informative mode. The resulting target texts were then statistically analyzed in terms of the number of words, phrases, si...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a variety of library resources. However, classifying documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classifying similar documents into specific classes of data can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3099758